safety test
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
- Government > Regional Government > North America Government > United States Government > FDA (0.46)
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- (2 more...)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Government (1.00)
- Information Technology (0.93)
- North America > United States (0.28)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.96)
- Government (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
- Information Technology > Data Science (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts (0.04)
- Europe (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)
- Government > Military (0.69)
- Government > Regional Government (0.68)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)
ChatGPT offered bomb recipes and hacking tips during safety tests
A ChatGPT model gave researchers detailed instructions on how to bomb a sports venue – including weak points at specific arenas, explosives recipes and advice on covering tracks – according to safety testing carried out this summer. OpenAI's GPT-4.1 also detailed how to weaponise anthrax and how to make two types of illegal drugs. The testing was part of an unusual collaboration between OpenAI, the $500bn artificial intelligence start-up led by Sam Altman, and rival company Anthropic, founded by experts who left OpenAI over safety fears. Each company tested the other's models by pushing them to help with dangerous tasks. The testing is not a direct reflection of how the models behave in public use, when additional safety filters apply.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.73)
Review for NeurIPS paper: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms
Weaknesses: W1: The study seems to focus too heavily on algorithms that are based on safety tests. I understand that the analysis is not compatible, but it might be worth including a study of how easy it is to trick those algorithms too. More generally (even for IS algorithms), it seemed odd to me that the study does not consider attacks on the way pi_e is chosen. W2: It is unclear to me whether the trajectory must still have been performed in the real environment, or whether it can be completely made up (but then its value has to be within the range [0,1]). Also, with model-based methods (for both environment and policy models), it might be possible to single out the few trajectories that are inconsistent with the other trajectories.
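For readers unfamiliar with the safety tests the review refers to: Seldonian-style algorithms typically accept a candidate policy pi_e only if a high-confidence lower bound on its importance-sampled return beats a baseline. The sketch below is illustrative (not the paper's exact test); it assumes the weighted returns have already been clipped into [0, b_max] and uses a Hoeffding bound, where real implementations often use tighter Student's t bounds. Note the fabrication attack the review raises: nothing here checks that the trajectories were actually collected in the real environment.

```python
import math

def safety_test(weighted_returns, baseline, delta=0.05, b_max=1.0):
    """One-sided Hoeffding test: pass only if, with probability >= 1 - delta,
    the true importance-weighted return of pi_e exceeds the baseline.

    Assumes each weighted return lies in [0, b_max]. The inputs are taken
    on trust -- a fabricated trajectory with a plausible in-range value
    (the attack raised in W2) would not be detected here.
    """
    n = len(weighted_returns)
    mean = sum(weighted_returns) / n
    # Hoeffding: true mean >= empirical mean - eps with prob >= 1 - delta
    eps = b_max * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return mean - eps > baseline
```

With 100 returns averaging 0.6 against a baseline of 0.4, the confidence radius is about 0.12, so the test passes; with only 10 samples the radius widens to about 0.39 and the same data fails against a baseline of 0.55.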
CSPI-MT: Calibrated Safe Policy Improvement with Multiple Testing for Threshold Policies
Cho, Brian M, Pop, Ana-Roxana, Gan, Kyra, Corbett-Davies, Sam, Nir, Israel, Evnine, Ariel, Kallus, Nathan
When modifying existing policies in high-risk settings, it is often necessary to ensure with high certainty that the newly proposed policy improves upon a baseline, such as the status quo. In this work, we consider the problem of safe policy improvement, where one only adopts a new policy if it is deemed to be better than the specified baseline with at least pre-specified probability. We focus on threshold policies, a ubiquitous class of policies with applications in economics, healthcare, and digital advertising. Existing methods rely on potentially underpowered safety checks and limit the opportunities for finding safe improvements, so too often they must revert to the baseline to maintain safety. We overcome these issues by leveraging the most powerful safety test in the asymptotic regime and allowing for multiple candidates to be tested for improvement over the baseline. We show that in adversarial settings, our approach controls the rate of adopting a policy worse than the baseline to the pre-specified error level, even in moderate sample sizes. We present CSPI and CSPI-MT, two novel heuristics for selecting cutoff(s) to maximize the policy improvement from baseline. We demonstrate through both synthetic and external datasets that our approaches improve both the detection rates of safe policies and the realized improvement, particularly under stringent safety requirements and low signal-to-noise conditions.
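The core mechanism the abstract describes — test several candidates, but only adopt one whose safety check clears the baseline at a corrected error level — can be sketched minimally. This is not the paper's CSPI-MT procedure: it substitutes a plain asymptotic (normal-approximation) lower confidence bound with a Bonferroni correction across the candidates, and the per-sample reward lists are hypothetical inputs standing in for the paper's policy-value estimates.

```python
from statistics import NormalDist, mean, stdev

def adopt_policy(candidate_rewards, baseline_value, alpha=0.05):
    """Among candidate policies whose Bonferroni-corrected asymptotic
    lower confidence bound on mean reward exceeds the baseline, adopt
    the one with the highest point estimate; otherwise keep the baseline.

    candidate_rewards: list of per-sample reward lists, one per candidate.
    Returns the index of the adopted candidate, or None for the baseline.
    """
    m = len(candidate_rewards)
    # Bonferroni: split the error budget alpha across the m tests
    z = NormalDist().inv_cdf(1 - alpha / m)
    best_idx, best_mean = None, baseline_value
    for i, rewards in enumerate(candidate_rewards):
        mu = mean(rewards)
        se = stdev(rewards) / len(rewards) ** 0.5
        if mu - z * se > baseline_value and mu > best_mean:
            best_idx, best_mean = i, mu
    return best_idx
```

Because every adopted candidate must individually clear its corrected safety test, the chance of adopting any policy worse than the baseline stays below alpha — the error-control property the abstract claims — while testing multiple cutoffs raises the chance that at least one safe improvement is found.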
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Israel (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
Warning that robot lawnmowers are killing hedgehogs: Scientists propose must-have garden gadgets come with 'safety certificates'
Hedgehogs are increasingly being killed and injured in encounters with robot lawnmowers, which have few safety features to protect wildlife, according to Oxford University scientists. Researchers conducted a series of tests with the mowers, the latest must-have garden gadget, with a view to creating a 'hedgehog friendly' certification so gardeners need not fear any prickly casualties when they trim the grass. To ensure no harm was caused to living hedgehogs, scientists used rubber 'crash test hedgehogs' instead to see if the robot mower would turn away on encountering one of Mrs Tiggywinkle's tribe on the lawn. Hedgehogs are already in serious decline, with reasons including habitat loss, road traffic accidents, intensive agriculture, and injuries from dog bites and garden strimmers. But now mowers are adding to the threats.